Align-Anything Chameleon 7B Plus
Introduction
This repository hosts Align-Anything Chameleon 7B Plus, a model that accepts and produces interleaved text and images. It is based on the Chameleon model and was further trained and aligned with the Align-Anything framework to unlock its image-generation capability and improve its alignment with human preferences.
Usage
To use this model, refer to the Align-Anything repository, which covers training, inference, and evaluation:
git clone https://github.com/PKU-Alignment/align-anything.git
cd align-anything/projects/text_image_to_text_image
Then follow the instructions in the README.md file to set up the environment and run the scripts.
Currently, the official Transformers repo does not support the Chameleon model with image output (see this PR for more details), so we rely on a fork of the repo.
After installing Align-Anything and setting up the environment correctly, install the stable version of the fork by running:
pip install git+https://github.com/htlou/transformers.git@hantao_stable_cham
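After installing, it can help to confirm which Transformers build Python actually picks up. The snippet below is a minimal sketch, not part of the official instructions: the helper names `installed_version` and `chameleon_available` are hypothetical, and the class name `ChameleonForConditionalGeneration` is assumed to match upstream Transformers. Upstream also ships this class, so the check only confirms that a Chameleon-capable build is importable, not that it is the fork.

```python
from importlib import metadata, util


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def chameleon_available() -> bool:
    """Return True if an importable transformers exposes a Chameleon generation class.

    The class name is an assumption taken from upstream Transformers.
    """
    if util.find_spec("transformers") is None:
        return False
    try:
        import transformers

        return hasattr(transformers, "ChameleonForConditionalGeneration")
    except Exception:
        # A broken or partial install is treated as "not available".
        return False


if __name__ == "__main__":
    print("transformers version:", installed_version("transformers"))
    print("Chameleon class available:", chameleon_available())
```

If the reported version or availability is not what you expect, the usual cause is a second Transformers install shadowing the fork in the active environment.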
To generate images, follow the instructions in the mmsg_chameleon repo to run inference (pure text generation can be done directly with Transformers):
git clone https://github.com/htlou/mmsg_chameleon.git
cd mmsg_chameleon
Then set up the environment with:
pip install -e . 
After setting up the environment, set the correct paths in scripts/interleaved_gen.sh and then run:
bash scripts/interleaved_gen.sh
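For the text-only path mentioned above, the following is a minimal sketch using the Chameleon classes from Transformers. It is an illustration under assumptions rather than the authors' script: `generate_text` is a hypothetical helper, `model_path` is a placeholder for wherever the checkpoint lives, the class names are assumed to match upstream Transformers, and the fork may behave differently. Image generation is not covered here and still goes through mmsg_chameleon.

```python
def generate_text(model_path: str, prompt: str, max_new_tokens: int = 128) -> str:
    """Generate a text-only continuation of `prompt` with a Chameleon checkpoint.

    `model_path` is a placeholder: a local directory or hub ID of the checkpoint.
    """
    # Heavy imports stay inside the function so this file can be imported
    # on machines that do not have torch or transformers installed.
    import torch
    from transformers import ChameleonForConditionalGeneration, ChameleonProcessor

    device = "cuda" if torch.cuda.is_available() else "cpu"

    processor = ChameleonProcessor.from_pretrained(model_path)
    model = ChameleonForConditionalGeneration.from_pretrained(
        model_path, torch_dtype=torch.bfloat16
    ).to(device)

    # Text-only input: no images are passed to the processor.
    inputs = processor(text=prompt, return_tensors="pt").to(device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Drop the prompt tokens so that only the newly generated text is returned.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return processor.decode(new_tokens, skip_special_tokens=True)


# Example call (requires the checkpoint to be available locally or on the hub):
# print(generate_text("path/to/checkpoint", "Describe a sunset over the sea."))
```

Loading in `bfloat16` keeps memory use manageable for a 7B model; on hardware without bfloat16 support, a different dtype may be needed.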
